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Abstract 

We propose extensions and improvements of the statistical analysis of distributed multipoles 
(SADM) algorithm put forth by Chipot et al in [6] for the derivation of distributed atomic 
multipoles from the quantum-mechanical electrostatic potential. The method is mathematically 
extended to general least-squares problems and provides an alternative approximation method 
in cases where the original least-squares problem is computationally not tractable, either because 
of its ill-poscdness or its high-dimensionality. The solution is approximated employing a Monte 
Carlo method that takes the average of a random variable defined as the solutions of random 
small least-squares problems drawn as subsystems of the original problem. The conditions that 
ensure convergence and consistency of the method are discussed, along with an analysis of the 
computational cost in specific instances. 
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1 Introduction 

In the realm of the molecular modeling of complex chemical systems, atom-centered multipole 
distributions constitute a popular route to simplify the description of intricate electron densi- 
ties. Streamlined down to their most rudimentary representation, these densities are generally 
mimicked in macromolecular force fields by simple point charges, from which, in the context of 
molecular simulations, Coulomb interactions can be rapidly evaluated. Whereas nuclear charges 
are clearly centered onto the constituent atoms, the electron charge distribution extends over 
the entire molecular system. As a result, in sharp contrast with the higher-order multipole 
moments of a neutral molecule, which, strictly speaking, are quantum-mechanical observablcs, 
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atomic charges cannot be defined univocally, in an equally rigorous fashion. They ought to be 
viewed instead as a convenient construct, the purpose of which is to reduce the complexity of 
molecular charge distributions by means of compact sets of parameters providing a useful, albeit 
naive framework to localize specific interactions onto atomic sites. 

The ambiguous nature of atom-centered charges has, therefore, prompted the development 
of alternative paths towards their determination [8] . The choice of the numerical scheme ought 
to be dictated by three prevalent criteria, namely (i) the computational cost of the derivation, 
(ii) the ease of implementation within the framework of a physical model and (iii) the ability 
of the point-charge model to reproduce properties of interest with the desired accuracy. Under 
a number of circumstances, crude atomic charges determined through inexpensive calculations 
are shown to be adequate. In other, more common scenarios, for instance, in molecular simula- 
tions of complex chemical systems, the accurate description of the electrostatic interactions at 
play can be of paramount importance. The atomic charges utilized in these simulations arc by 
and large derived from quantum-mechanical calculations carried out at a reasonably high level 
of theory, which in many cases, can be appreciably expensive. In the vast majority of popular 
potential energy functions, point-charge models are derived quantum-mechanically, following, in 
a nutshell, two distinct philosophies. On the one hand, the numerical simulations of condensed 
phases imposes that solute-solvent interactions be described as accurately as possible. Accord- 
ingly, in macromolecular force fields like Charmm [16], the atomic charges are determined based 
on a series of independent quantum-mechanical calculations featuring different relative positions 
of a solvent molecule around the solute. On the other hand, the electrostatic potential can be 
viewed as the fingerprint of the molecule, the accurate representation of which guarantees a 
reliable description of intermolecular interactions. In potential energy functions like Amber [9], 
point charges are derived from the molecular electrostatic potential, exploiting the fact that the 
latter is a quantum-mechanical observable readily accessible from the wave function. 

In their seminal article, Cox and Williams [10] proposed an attractive approach, whereby 
sets of atom-centered charges can be easily derived on the basis of a single-point quantum- 
mechanical calculation. The electrostatic potential is evaluated on a grid of M points lying 
around the molecule of interest, outside the van der Waals envelope of the latter. Restricting 
the multipolc expansion of the electrostatic potential to the monopole term, the charges borne by 
the n atomic sites of the molecule are determined by minimizing the root-mean square deviation 
between the reference, quantum-mechanical quantity and its zcroth-ordcr approximation — i.e. 
qiT®®, where T" ( ° = \\x-i — Xfc|| _1 , is the potential created at point k by atomic site i. In 
its pioneering form, the algorithm handled the least-squares problem iteratively. Chirlian and 
Francl subsequently proposed to resort to a non- iterative numerical scheme [7], which obviates 
the need for initial guesses and solves the ovcrdctcrmincd system of linear equations through 
matrix inversion. This route for the derivation of point-charge models can be generalized in a 
straightforward fashion to higher-order multipoles. 

The success of potential-derived charges stems in large measure from their ease of compu- 
tation and the demonstration for a host of chemical systems that they are able to reproduce 
with an appreciable accuracy a variety of physical properties. This success is, however, par- 
tially clouded by one noteworthy shortcoming of the method — point charges borne by atoms 
buried in the molecule cannot be determined unambiguously from a rudimentary least-squares 
fitting procedure. Symptomatically, for those molecular systems, in which the contribution of 
the subset of buried atoms to the electrostatic potential is ill-defined, the derived charges are 
in apparent violation with the commonly accepted rules of electronegativity differences, e.g. a 
C s ~ — Cl s+ bond polarity in carbon tetrachloride, in lieu of the intuitive C s+ — CI 5- . Bayly et 
al. tackled this issue through the introduction of hyperbolic penalty functions in their fitting 
procedure [3]. Arguably enough, this numerical scheme addresses the symptom rather than its 
actual cause. As was commented on by Francl et al. in the light of singular- value-decomposition 
analyses [13], the matrices of the least-squares problem are rank deficient, to the extent that 
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statistically valid charges cannot be assigned univocally to the selected set of atoms in the 
molecule. 

To delve further into this issue, Chipot et al. proposed an alternative algorithm coined 
statistical analysis of distributed multipoles (SADM) [6], wherein atom-centered multipoles are 
also derived from the quantum-mechanical electrostatic potential, yet following a somewhat 
different pathway than the conventional least-squares scheme. Instead of solving directly the 
n x M overdetermined system of linear equations, for instance through matrix inversion, a 
subset of n points is drawn amongst the M points of the grid and the corresponding n x n 
system of linear equations is solved. This procedure, referred to as an experiment, is repeated 
with different subsets of grid points, from whence probability distributions are obtained for 
the series of multipoles being sought. Strictly speaking, each probability distribution ought to 
be determined from independent experiments. On account of the computational burden, 
however — viz. typically, for a molecule formed by ten atoms and a grid of 2,000 points 
sampling the three-dimensional space around it, this would imply solving approximately 2.76 
x 10 26 systems of linear equations — only 3-5 x 10 5 independent experiments are performed, 
which has proven hcuristically to be appropriate. 

The mathematical description of this problem is the following: denoting by (?j)" =1 the 
unknown charges borne by the n particles, and by Jj(x) = \\x—Xj\\ , the electrostatic potential 
associated with each Xj € R 3 , the least square problem consists in finding the minimum G 
R" of the function 

M n 

R"9a^^|/(y 4 )-E a ^^)| 2 ' ( L1 ) 

»=1 3=1 

where (yj)fL 1 £ R 3M are the coordinates of the M external points. Here f(yj) stands for the 
approximation of the electrostatic potential at y 3 obtained by quantum-mechanical calculations. 

Instead of solving directly the problem (1.1), the SADM consists in drawing n points yW 
amongst the M points yj, solve the n x n problem f(y^) = lj{v ) a j> i = 1, ■ ■ ■ ,n in 

the least squares sense and subsequently plot the distribution of each a,j. In [6], Chipot et al. 
notice that the latter are Cauchy-likc distributions (with seemingly infinite expectation) centered 
around the exact solution of the original least-squares problem. Note that this method not only 
provides a numerical approximation of the solution, but also a global statistical distribution 
that reflects the accuracy of the physical model being utilized. 

Interestingly enough, it turns out that this kind of approach can be extended to many situa- 
tions arising in computational mathematics and physics. The principle of the SADM algorithm 
is in fact very general, and can be adapted to derive efficient algorithms that are robust with 
the dimension of the underlying space of approximation. This in turn provides new numerical 
methods of practical interest for high dimensional approximation problems, where traditional 
least-squares methods are impossible to implement, either because of the high dimensionality 
or the ill-posedness of the least-squares problem. 

The goal of the present contribution is twofold: 

• Introduce a general mathematical framework, and analyze the consistency, convergence 
and cost of the proposed algorithms in an abstract setting and in specific situations where 
calculations can be made explicit (Wishart or subgaussian distributions). The main out- 
come is that the subsystems drawn from the original system have to be chosen rectangular 
and not square (as initially proposed in the SADM method) to yield convergent and effi- 
cient algorithms. In other words, instead of drawing n x n subsystems, we will show that 
in many cases of applications, it is more interesting to draw n x n + 2 or n x 2n subsystems 
in order to control the expectation and variance of the distribution. 

• Apply these results to revisit and improve the SADM method. This is mainly achieved in 
Section 5 by considering a simple, three-point charge model of water. 
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2 Mathematical setting 

Let us now describe more precisely the problematic. 



2.1 General least-squares problems 

Let (CI, /x) be a probability space Q equipped with a probability measure \x. For a given arbitrary 
function / G L 2 (f2) and n given functions Jj(x) G L 2 (f2), j = 1, . . . ,n all taking values in R, 
we consider the problem of approximating f(x) by a linear combination of the functions Jj(x), 
j = 1, . . . ,71. 

Ideally, we would like to solve the problem of finding a — (<x/)™ =1 G R n , minimizing the 
function 

n 

R n 5a^\\f(x)-J2*nj(x)\\l HQ y (2-1) 

The actual quality of the least-squares approximation is given by the size of the residue 
\\p(a)\\ L2(n) where for a = ( fflj -)? =1 G M n , 

p(a)(a;)=/(a:)-5^o J -7 i (x). (2.2) 
j'=i 

Many minimization problems arising in mathematics and in physics can be stated under this 
form, for instance: 

(a) Q = [a, b] n with two real numbers a and b > a, and equipped with the measure dfi(x) ~ 
(b — a)~ n dx where dir is the Lebesgue measure on R™. Taking 7 : ft — > R n+1 defined by 
7i(x) = Xi for all i G {1, . . . , n} and 7 n +i = 1, the problem is equivalent to finding /3 G M 
and a G K n minimizing the function 



\\f(x)-0- (a,x)\\ L2([ab]r , 



where ( • , • ) is the standard Euclidean product in R™. This is nothing else than a multi- 
variate linear interpolation. 

Similarly, any polynomial approximation problem in L 2 ([a, b] n , /i), where /i is a weight 
function, can be written in the form (2.1) by taking as jj a basis of polynomials in dimen- 
sion n. 

(b) Taking D, = R™ equipped with a given n-dimensional Gaussian measure leads to many 
different situations: The approximation by Hermite functions in R™ if jj are polynomials, 
the approximation of / by Gaussian chirps signal [18] in the case where jj(x) are oscillating 
functions of x, or alternatively approximation by Gaussian wavepackets functions [15] in 
the context of molecular dynamics. 

(c) Consider D, — {1, . . . , M} with M 3> n equipped with the uniform probability measure 
M~ x J2i=i I R this case, an application / is represented by a vector b G R M , whereas 7 
is represented by a matrix A with n columns and M lines. The problem is then equivalent 
to the problem of finding a G R n that minimizes 

Ua-bf 2 

where || • || is the Euclidean norm on R M . This corresponds to the case described in (1.1). 
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(d) Consider fl = R" x f2' equipped with the measure \i ® v where \x and v are probability 
measures on M" and O' respectively. Taking f(x,uj') = h(x) + X(ui') where X(us') is a 
given random variable on f2', and 7j(x,u/) = for j = 1, . . .,n yields the problem of 
minimizing 



min E 



\\(a,x) - f(x,cj')\\l, {Rn) (2.3) 



which corresponds to the linear regression of a function observed with some independent 
noise. 

The problem (2.1) is equivalent to solving the linear equation 

(7,7 T >L 2 a = (7,/)l 2 

where a = (ai)™ =1 and (7,7 T )l2 is the n x n matrix with coefficients (7^,7^)1,2, i,j = l,...,n. 

If the family (7i(x))™ =1 defines a full rank set of elements of L 2 (f2), the matrix (7,7 T )l2 is 
invertible, and the solution of the previous equation reads 

" = (7,7 T )l2 1 -(7,/)l- (2.4) 

Apart from specific situations, where, for instance, the 7, can be assumed orthogonal, the 
numerical approximation of (2.4) is extremely costly with respect to the dimension of fi (see for 
instance [4]). Typically, discretizations of problems of the form (a) yields a problem of the form 
(c) with m = N n where N is the number of interpolation points in [a, b] needed to approximate 
the L 2 integrals. For n = 30, this method is not tractable in practice, even if iV = 2. 

To avoid this curse of dimensionality, an alternative would consist in approximating the 
integrals in the formula (2.4) by using Monte Carlo methods. In large dimension, the matrix 
(7, 7 T )l 2 is, however, often ill-conditioned, and obtaining a correct approximation of the inverse 
of this matrix might require in practice a very large number of draws to minimize the error in 
the value of a. 



2.2 Principle of the algorithm 

In this abstract mathematical setting, the principle lying behind the SADM method can be 
extended to the following: Retaining the idea of drawing subsystems of the original problem, 
we consider the following algorithm: 

• Draw m points i = 1, . . . , m in Q independent and identically distibuted (i.i.d.) with 
distribution \x. 

• Solve the m x n least-squares sub-problem by determining /3 minimizing the function 

m m 

K"3^^ - £ P 3l] {X (€) )\ 2 - (2.5) 

»=i j=i 

• Approximate the expectation /3 of the random variable /3 by a Monte-Carlo method and 
analyse its distribution. 

More precisely, we define X := (X^, . . . , X^) and the functions F : Q. m ->■ « n and 
r : n m ->■ «5f(M m ,R") by the formulae 

Vz = l,...,m, F,^ 1 ',...,^" 1 ^/^) (2.6) 

and 

Vi = l,...,m, Vj = l,...,n, T l3 {x^ 1 \... 1 x^)= lj {x^). (2.7) 
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The random vector (3 then minimizes the function 

P*\\F{X)-T{X)f}tf 3 , 

where || • || is the standard Euclidean norm on M. m . 

Under the assumption that T T (X)T(X) is invertible almost surely (a.s.), 

p = R(x)F(x) := ((r T r)- 1 r T )(x)F(x). (2.8) 

The expectation of j3 is then given by the formula 

,3:=E/3 = f ((r T r)- 1 r T F)(x( 1 ),...,i (m) )d* (1) )®'"®^(i (ra) ). (2.9) 

Our algorithm consists in using the Monte-Carlo method to compute the previous expectation: 
we approximate j3 by 

1 N 

i=l 

where fli,i> 1 are i.i.d. realizations of the random vector j3 £ R n , obtained by (2.8) from i.i.d. 
realizations of the random n x to matrix T(X). 

Of course, one expects that j3 should converge to the solution of the least square problem 
(2.4) when to — > +oo. This indeed can be easily proved under the additional assumption that 
/ and 7j, 1 < j < n belong to L 2 (f2). By the strong law of large numbers, 

1 1 m 

-(r(xfrpo) y - = -^ 7j (iW) 7j (iW) (2.ii) 

to m ' 

fc=i 

converges P-a.s. to ((7,7 T )L 2 )y when m — > +oo. Similarly, 

-(TiXfFiX)), = -jr 7i (xW)/(xW) (2.12) 
to m z — ' 

fe=l 

converges P-a.s. to ((7, /)l 2 )i- Consequently, if the matrix (7,7 T )l 2 is invertible, 

/3= f-r T (x)r(x)V 1 -r(x) T F(x) (2.13) 

Vm /to 
converges P-a.s. to a given by (2.4) when to — > +00. 

However our goal is not to analyse more finely this convergence, as we are concerned with 
situations where the least square problem (2.1) is ill-posed or computationally unfeasible due 
to the high diemsnion of the problem. In the opposite, considering the case where to is small in 
comparison with the dimension of f2 (M in the case of SADM) should reduce the computational 
cost, provided that the efficiency of the Monte-Carlo approximation is good. To express the 
fact that we are in a regime where to is small, we assume in the following that m < Cn 
for some constant C (typically m = n + 2 or m = 2n for practical applications) . 

Therefore, to make sure that the previous algorithm is efficient, we have to verify the following 
points: 

(i) The random variable (3 has finite expectation and variance. Here the bounds may depend 
on n, but not on the cardinal of f2 (M in the SADM description above). This condition 
is crucial to ensure the convergence of a Monte-Carlo method and the approximability 
of /3. In addition, the smaller is the variance, the faster the Monte-Carlo approximation 
converges to /3. 



6 



(ii) The average f3 is a good alternative to the solution of the original problem (2.1) in the 
sense that (3 — a — ^'(llp(a)ll ) where p(a) is the residue (2.2). In other words, if / is 
close to a linear combination of the functions 7 the residue will be small and the standard 
least-square approximation will be efficient. In this situation, /3 will also lead to a good 
approximation, and be close to the solution a. On the other hand, when the residue 
is large, j3 and a may differ, but in this situation the approximation of / by a linear 
combination of the functions of 7 is poor in any case. 

In Section 3 we give various conditions that warrant the latter requirements. In particular, 
we study the consistency of the algorithm, give conditions ensuring the convergence of the Monte 
Carlo method, and analyze the computational cost. In the specific instance where T(X) T T(X) 
has the Wishart distribution, all computations can be made explicitly, and we obtain precise 
estimates and an optimal choice of the parameter m. The two values m = n + 2 and m = 2n 
are of specific interest in this situation. In addition, we prove that the choice m — n leads to a 
random variable ft with infinite expectation, which partly explains the Cauchy-like distributions 
observed in [6] with the SADM method. 

2.3 The algorithm in the non-invertible case 

In practice, the almost sure invertibility of T T (X)T(X) cannot be guaranteed — and obvi- 
ously not for problems of the form (c), where all the random variables X^\ . . . , X^ may be 
equal with positive probability. 

In a more general setting, we, hence, restrict ourselves to realizations of X, such that matrix 
T(X) is sufficiently well conditioned, in the following sense: Denoting by si(r(X)) the smallest 
eigenvalue of the symmetric positive matrix T(X) T T(X), we only consider realizations of X, 
such that si(r(Jf)) is greater than some threshold cr, which may depend on n and m. In this 
case, rather than approximating (2.9), we will estimate the conditional expectation 



where the /3f are obtained from a sequence of i.i.d. realizations of the random vector j3 £ 1" 
in (2.8), from which have been removed all realizations such that si(T(X)) < a. Note that (2.10) 
is a particular case of (2.15) for a — 0, provided that P(si(r(X)) = 0) = 0. 

Again, such a method will be of interest in terms of computational cost if m is on the order 
of magnitude of n (in all the applications considered herein, m — n + 2 ot m — 2n will be 
sufficient) and if P(si(r(X)) > cr) is not too small — because drawing a realization of X such 
that si(r(X)) > a requires an average number P(si(r(X)) > cr) -1 of realizations of X . 

From the perspective of precision, this method will perform well if the variance of /3 con- 
ditionally on {si(r(X)) > cr} has an appropriate behavior with respect to n and m, and if 
f3 a defined in (2.14) provides a good approximation of the solution of the original least-squares 
problem. 

The specific case where the T(X) T T(X) is not a.s. invertible is studied in Section 4, where 
we give various conditions that warrant the latter requirements. The instance where T(X) has 
subgaussian entries (which covers the Wishart case mentionned above) is then studied in more 
details and leads again to optimal choices of m, N and a. 



f3 a := W(3 



E[/3 I S i(r(X)) > a] 



(2.14) 



by 




(2.15) 
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3 The invert ible case 

In all this section, we assume that the matrix T(X) T T(X) is a.s. invertible. 



3.1 Preliminary results 

Before studying the algorithm of Section 2.2, let us define for q E [2, +oo] 

2 

(3.1) 

where T(X) is the random matrix defined by (2.7) and with the usual convention that K^T) = 
||si(r(X)) -1 ||L°°. Note that K q (T) depends on n and to. 
The proof of the next lemma is given in Appendix A. 

Lemma 3.1 Let p £ [1, oo] and g £ h p (Q). Let us define the function G from g as F is defined 
from f in (2.6). Let also R(X) be the random matrix defined in (2.8). 

(a) Assume that K q (T) < +oo where q is such that q^ 1 + p^ 1 = 1. Then we have 

E\\R(X)G(X)\\ 2 < y/EmjK q (p)\\g\\ tf(fl) . (3.2) 

(b) Assume that p £ [2, oo] and that K q (T) < +oo where q is such that 2g _1 + 2p~ 1 = 1. Then 

we have 

n\R(X)G(X)\\ I < nm 2 K q {T)\\g\\ ^ (n) . (3.3) 

The next result is a first consequence of this lemma. We recall the definition of /? in (2.9) 
and that p{a) denotes the residue (2.2) associated with the function / and the coefficients a,j, 
j = 1, . . . ,n. 

Proposition 3.2 Let a = (aj)™ =1 £ R" and to < Cn for some constant C. Assume that 
p(a) £ h p (fl) and K q (T) < +oo for some p £ [1, +oo] and with q^ 1 +p~ 1 = 1. Then there exists 
a constant C(n) depending on n such that 

E||/3- a|| 2 <C(n)\\p(a) || LP(n) . (3.4) 

Proof. By definition of R{X), and as T(X) T Y(X) is invertible, we have 

R(X)T(X)a = a. 

Hence 

/3 - a = R{X)F(X) - R(X)T(X)a = R(X)p(a)(X). (3.5) 

where p(a)(X) is defined from p(a) as F was defined from / in (2.6). The result then follows 
from Lemma 3.1 (a) with C(n) — ^frvm^/ K q (T). ■ 



K q (T) := 



E ; 

8l T(X ))i 



3.2 Average and variance 

The following result is an immediate consequence of Prop. 3.2. It gives conditions on / and 
r to ensure that the random variable f3 has finite expectation, and thus that the Monte Carlo 
approximation a.s. converges to /3 when N — > +oo. 

Corollary 3.3 Let m < Cn for some constant C and assume that f £ L p (f2) and K q (T) < +oo 
for some p G [l,+oo] and with q^ 1 + p^ 1 = 1. Then there exist a constant C(n) depending on 



n 



such that 



E||/3|| 2 <C(n)||/|| LP(n) 



S 



In order to estimate the convergence rate of algorithm, we need to construct confidence 
regions with asymptotic level (less than) r\ for the Monte Carlo approximation of fa We are 
going to consider confidence regions of the form [ai,£>i] x . . . x [a n ,b n ], by taking each [cij,&j] 
as a confidence interval of asymptotic level n/n for the i-th coordinate fa of fa Note that more 
precise asymptotic confidence regions exist — see for instance [1] — but the previous confidence 
region is more convenient for computation. Note also that non-asymptotic estimates could be 
obtained using Berry-Essen-type inequalities — see for instance [19]. 

This leads to the choice 

&i - Oi = 2x(n, r))^Va,T(fa)/N, Vie{l,...,n} 
where N is the number of draws in (2.10), and where x(n, rj) > is the solution of 

-i />+oo 

1 ' e- u2/2 du = (3.6) 



V^TT Jx(n,r)) 2n 

In this case, the Euclidean diameter of the confidence region is bounded by 



2x(n,rj)y/Tr(Gov(f3))/N, (3.7) 

where Cov(/3) is the covariance matrix of the random vector fa defined by 

Cov(/3) := E[(/3 - Ep)(fi - E/3) T }. 

The next result gives bounds on the quantity Tr(Cov(/3)), which, in view of (3.7), controls 
the rate of convergence of the Monte-Carlo approximation. 

Proposition 3.4 Letm < Cn for some constant C and assume that p(fa) € L p (£!) and K q (F) < 
+oo for some p £ [2, +oo] and with 2p~ 1 + 2q~ l = 1. Then there exist a constant C(n) depending 
on n, such that 

Tr(Cov(fa)) <C(n)\\p(f3)\\l Pin y (3-8) 
Proof. Let g = p(fa) and define G from g as F is defined from / by (2.6). Then 

Tr(Cov(/3)) = E||/3 - fa\\ = E\\R(X)G(X)f 2 . 
The result, hence, follows from Lemma 3.1 (b) with C(n) = nm 2 K q {T). ■ 

These results show that the convergence of our algorithm relies on an assumption of the 
form K q (T) < +oo, which corresponds to the finiteness of a negative moment of the random 
variable si(r(X)). Such an assumption is clearly problem-dependent and has to be checked in 
each specific problem considered. Conditions ensuring this property when q < +oo are given in 
Appendix B and are used to handle the specific case of Wishart matrices in Subsection 3.5. 

Note that, under the assumptions of this section, the condition K q (T) is unlikely to be 
satisfied when q = +oo. Indeed, since T T (X)T(X) is assumed a.s. invertible, the measure /i 
must have no atom, and hence 51 is continuous (i.e. not denumerable) . If we assume in addition 
that the functions 7^ are regular on f2, so are the eigenvalues of T T (x)T(x) as a function of 
x = (x^ , . . . , x( m )) <E f2 m . Since the smallest eigenvalue is when x^ = . , . = x^ , we deduce 
that P(si(r(X)) < rj) > for all r\ > 0, which means that K 00(F) = 00. The way to handle the 
case q = 00 is explained in Section 4. 
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3.3 Link with the least square approximation 

Formula (2.9) proposes an alternative solution j3 to the solution a given by (2.4) of the least- 
squares problem (2.1). We now provide estimates between these two solutions. 

A precise error estimate depends on the tackled problem (see for instance Section 3.5). Here, 
we give a general result. 

Proposition 3.5 Assume that /, 71, . . . , j n belong to L 2 (fl) and that Kz(T) < +00. Then there 
exists a constant C(n) such that 

\\0-a\\ 2 <C(n)\\p(a)\\ L2{uy (3.9) 

Proof. Observing that p(a) € L 2 (f2), this is an immediate consequence of Prop. 3.2 and of 
the inequality \\f3 — a\\ < E\\/3 — a\\ . ■ 

In other words, the better / can be approximated by a linear combination of the functions 7^, 
1 < j < n, the closer the result of our algorithm is from the actual least square approximation. 

3.4 Computational cost of the algorithm 

Let e be a required precision for the approximation of /3 = E/3 by the Monte Carlo simulation 
(2.10). For large N, using (3.7), we must take 

N ~ 4sc(n, ?7) 2 £- 2 Tr(Cov(/3)). 

Since, for all x > 0, 

+00 1 r+oo -x 2 /2 

e' h 1/2 du <- ue- u ' 2 du = , (3.10) 

X J x X 

we deduce from (3.6) that, for n/rj large enough, 

x(n,ry) 2 <(21og^I) (3.11) 

In addition, each step of the algorithm requires to evaluate the matrix T{X) T T{X) and the 
vector T(X) T F(X) and to invert the matrix T(X) T T(X). The cost of these operations is of 
n 2 m. 

Hence, we see that the cost of the algorithm is of order 



order C™ 2 



Ce~ 2 mn 2 lognTr(Cov(/3)). 

Under the hypothesis of Proposition 3.4 and using the explicit expression of C(n) obtained in 
the proof of this proposition, the computational cost can be bounded by 

Ce~ 2 m 3 n 3 \ognK q (r)\\p(P)\\l Pm (3.12) 

for 2p- x + 2q~ 1 = 1. 

It may be observed that this cost depends only on n and m — and not the dimension of J7. 
Moreover, it depends on the least-squares residue of the problem (2.1). In the event where / is 
close to a linear combination of the functions jj, the algorithm is, therefore, cheaper (and, by 
Prop. 3.5, more precise). As a consequence, the cost of our algorithm is driven by the quality 
of the original least-squares approximation in Problem (2.1). 
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3.5 The Wishart case 



Let us now consider the case where f2 = R n , 

dn(x) = (27r)-™/ 2 cxp(-||.T||2/2)d.Ti...dx„ 

and 7j(x) = Xj for j £ {1, . . . , n} — i.e. linear interpolation. In this case, the random vectors 
X® are standard n-dimensional Gaussian vectors, the matrix T(X) is a m x n matrix with 
i.i.d. standard Gaussian entries and the law of the matrix T{X) T T(X) is the so-called Wishart 
distribution — see e.g. [1]. 

The joint distribution of its eigenvalues is known explicitly and can be found for example 
in [1, p. 534]. In particular, T{X) T T{X) is a.s. invertible if m > n. The explicit density of 
the eigenvalues has been used to obtain estimates on the law of the smallest eigenvalue of such 
matrices in [11, 12, 5]. These results allow us to obtain explicit estimates in the Wishart case, 
proved in Appendix B. We shall restrict here to the case where / and p(f3) belong to L°°(fi), 
and we refer to Appendix B for further estimates. 

Under the previous assumptions, the conditions of Corollary 3.3 and Proposition 3.4 are 
satisfied for all m > n + 2. The computational cost is (asymptotically) minimal for the choice 
m = 2n and the corresponding computational cost is bounded by 

C£-Vlogn||p(^)||^ (3.13) 

for an explicit constant C independent of n, where e is the required precision of the algorithm. 
In addition, the consistency error of Proposition 3.5 is bounded by 

C'n\\p(a)\\ Loe 

for a constant C independent of n. 

We, hence, see that the values m = n + 2 and m = 2n are of specific interest in terms of 
convergence and computational cost. Although the Wishart case corresponds to very simple 
approximation problems, this result gives valuable clues about the way parameters should be 
chosen in our algorithm. These specific values of m are numerically tested in the example of the 
three-point charge model of water developed in Section 5, where improvements of the SADM 
method arc considered. 



4 The general case 

Let us now consider the general case where T(X) T T(X) is not assumed to be a.s. invertible. 

Fix a > 0. We denote by E CT (rcsp. Cov cr ) the expectation (resp. covariance matrix) 
conditionally on the event {si(r(A)) > er}. As an approximation of the solution of the least- 
squares problem, we will examine the conditional expectation 

P a =E a (/3). (4.1) 

As will appear below, our algorithm always converges for any a > 0. As in the invertible 
case, its performance relies on precise estimates on convergence, consistency and computational 
cost, given below. Optimal computations will then be detailed in the specific instance where 
the matrix T(X) has independent sub-Gaussian entries. 

4.1 Consistency, convergence and computational cost 

We first generalize Proposition 3.2: For all q € [1, +oo], let 

2 

1 



K°AT) 



E CT 

S1 {T{X))\ 



(4.2) 
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Proposition 4.1 Let dj, j — 1, . . . ,n &e n numbers <Zj. Assume that p(a) £ L p (f2) /or some 
p £ [1, +oo], i/ien 



||r — a|| 2 < E CT ||/5 — a|| 9 < ^gy^ 

where q is such that q^ 1 + p^ 1 = 1. 
Proof. Using the inequality 

- E||p(o)(X)||" 

E °'""°»W''^ P(si (r(.Y))> 1 .) 

in (A. 3), the proof is exactly the same as that put forth in Lemma 3.1 and Proposition 3.2. ■ 
Note that, by definition of E* 7 , for all q £ [1, oo], 

K°{T) < a' 1 . (4.3) 

In particular, taking a = in the previous result implies that conditional expectation (4.1) is 
always well defined for a > as soon as f £ L p (£l) for some p £ [1, +oo]. 

The following result generalizes Proposition 3.4 to the case where a > 0. Its proof is very 
similar to that of Proposition 3.4. We will, hence, omit it here. 

Proposition 4.2 Assume that the function p(J3 a ) £ L p (f2) for p £ [2, +oo]. We have 

2 

n in — 9 

T ^ C0 ^ ^ nSl (T(X))>a)V P XWW/tnWlw (4.4) 

where q is such that 2q~ 1 + 2p~ 1 = 1. 

Although the trivial inequality (4.3) always allows one to infer explicit bounds from the 
previous results, there are cases where optimal estimates on K° (r) are much better. Since 
our performance analysis relies heavily on precise estimates on K°(r), it is desirable to obtain 
conditions for better estimates. Such conditions arc given in Proposition B.2 in Appendix B, 
and will be used to handle the sub-Gaussian case described in the next subsection. 



We now consider the cost of the algorithm: Let /3^ <T - > denote a random variable having the 
law of P conditioned on {sx(T(X)) > a}. The cost of the algorithm is determined by 

• the number N of simulations of needed to ensure that the diameter of the confidence 
region for the Monte Carlo estimation of E(/3^ cr ^ ) ) = E CT (/3) = /3 " is smaller than a given 
precision e. To control this, we use the upper bound on the confidence region diameter 
given by (3.7), where 77 is the level of confidence of the approximation; 

• the average number of draws of the random variable X needed to simulate a realization 
of p( a \ which is l/P(si(r(X)) > a). Note that a draw corresponds to simulating a 
nm-dimensional random variable. 

• the computation of the n x n matrix T(X)T(X) T , which is of order n 2 m — all other 
computational costs, including the cost of the computation of si(L(A)) or the inversion 
of r(x) T r(x), are of a smaller order with respect to the dimension n of the problem, 
provided that m > n. 
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Consequently, the cost of the algorithm is bounded by 

CN¥(sx(T(X)) > a^inm + n 2 ™) 

for some constant C > 0. As 

N ~ 4x(n,r 1 ) 2 e~ 2 TT{Cov' J (/?)), 
because of (3.11), the cost can be bounded by 

Ce- 2 n 2 m log n¥( Sl (T(X)) > cr)" 1 Tr(Cov <T (/?)). 
Thus, if p(f3 a ) G L p (f2) for p € [2, +oo], because of Proposition 4.2, the cost is bounded by 
Cs- 2 n 3 m 3 \ognF( Sn (T(X)) > a)- 1 '! K°(T) ||p(^)||^ (n) 

for some constant C > 0, where 2g _1 + 2p _1 = 1. 

We, hence, can see that the choice of an optimal threshold a has to be balanced to optimize 
the ratio between K% (T) and the probability F(s n (T(X)) > a) at some appropriate powers. 

Again, explicit bounds may depend on the tackled problem. Hereafter, we develop the 
particular instance where T(X) is a matrix with independent sub-Gaussian entries. 

4.2 The sub-Gaussian case 

We recall that the convergence of the algorithm holds for any choice of it > 0. The goal of this 
section is to study the behaviour of the computational cost in the subgaussian case as a function 
of a and m. 

We consider the case where 51 = R™, 

dp(x) = <S>i =1 dv(xi) 

for some probability measure v on R, and Jj(x) = h(xj) for j G {1, . . . , n} for some function h 
on R. This is tantamount to the case of an approximation of the function / on 1" by a linear 
combination of functions depending on only one variable. 

In this case, it is clear that all the entries of matrix T(X) are i.i.d. Let us assume that these 
random variables are sub- Gaussian, i. e. 

Vf > 0, v{{x G E : \h{x)\ > t}) < 2exp(-t 2 /R 2 ) 

for some R > 0. Such is the case, in particular if h is bounded or if v has compact support 
and h is continuous on the support of v. Rudelson & Vershynin [20] have recently obtained 
estimates on the distribution of si(T(A)) in the subgaussian case, optimal in the sense that 
they are consistent with the explicit bounds in the Wishart case. 

Using these results, under the assumption that / and p{0) belong to L°°(f2) and taking 
a = an for some constant a > 0, computations in Appendix B prove that the optimal choice for 
m in terms of (asymptotic) computational cost is m = 2n, and we have the same estimates on 
the computational cost and the consistency as in Section 3.5. 

This shows that, choosing conveniently a, the computational cost has the same behaviour 
as is the Wishart case. In addition, the result in terms of computational cost in n appears to be 
relatively unaffected by the choice of a. In particular, the specific value of the constant a such 
that a — an only has an influence of the constant C in (3.13). 
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5 Improvement of the SADM method 



The statistical analysis of distributed multipoles (SADM) algorithm put forth in [6] corresponds 
to a problem of the form (c), where (<X;)" =1 represent the unknown multipoles borne by the 
n particles, and Jj(x) — l/\\x — Xj\\ the electrostatic potential functions, where xi,...,x n 
denote the positions of the particles. The space fi is made of M points in the three-dimensional 
Cartesian space, lying away from the atomic positions, with M >> n. 

However more computationally intensive than the least-squares scheme, this pictorial ap- 
proach provides a valuable information as to whether the atomic multipoles are appropriately 
defined, depending on how spread out the corresponding distributions are. For instance, descrip- 
tion of the molecular electrostatic potential of dichlorodifiuoromethane (CCI2F2) by means of a 
simple point-charge model yields a counterintuitive C s ~ — X 5+ bond polarity — where X = CI 
or F, blatantly violating the accepted rules of electronegativity differences. Whereas the least- 
squares route merely supplies crude values of the charge borne by the participating atoms, the 
SADM method offers a diagnosis of pathological scenarios, like that of dichlorodifiuoromethane. 
In the latter example, the charge centered on the carbon atom is indeterminate, as mirrored by 
its markedly spread distribution [6] . The crucial issue of buried atoms illustrated here in the 
particular instance of CCI2F2 can be tackled by enforcing artificially the correct bond polarity 
by means of hyperbolic restraints [3] . Violations of the classical rules of electronegativity differ- 
ences may, however, often reflect the incompleteness of the electrostatic model — e.g. describing 
an atomic quadrupole by a mere point charge. Addition of atomic dipoles to the rudimentary 
point-charge model restores the expected, intuitive C 5+ — X 5 ~ bond polarity [6]. 

In this section, we revisit the prototypical example of the three-point charge model of water. 
The molecular geometry was optimized at the MP2/6-311++G(c?,p) level of approximation. 
The electrostatic potential was subsequently mapped on a grid of 2,106 points surrounding the 
molecule, at the same level of theory, including inner-shell orbitals. All the calculations were 
carried out with the Gaussian 03 suite of programs [14]. Brute- force solution of the least- 
squares problem (2.1), employing the Opep code [2], yields a net charge of —0.782 electron- 
charge unit (e.c.u.) on the oxygen atom — hence, a charge of +0.391 e.c.u. borne by the two 
hydrogen atoms, with a root-mean square deviation between the point-charge model regenerated 
and the quantum- mechanical electrostatic potential of 1.09 atomic units, and a mean signed error 
of 51.1 %. This notoriously large error reflects the incompleteness of the model — a simple point 
charge assigned to the oxygen atom being obviously unable to describe in a satisfactory fashion 
the large quadrupole borne by the latter. 

On account of the space-group symmetry of water, only one net atomic charge would, 
in principle, need to be determined — the point charges borne by the two hydrogen atoms 
being inferred from that of the oxygen atom. Inasmuch as the SADM scheme is concerned, 
this symmetry relationship translates to a single equation to be solved per realization or exper- 
iment. Without loss of generality, two independent parameters will, however, be derived from 
the electrostatic potential, the point charges borne by the two hydrogen being assumed to be 
equal. Furthermore, in lieu of solving the individual C| 106 systems of 2 x 2 linear equations, 
incommensurable with the available computational resources, it was chosen to select randomly 
500,000 such systems. 

The running averages of the charge borne by the oxygen atom are shown in Figure 1 as a 
function of the number of individual realizations, for the SADM algorithm with n = N s points 
and its proposed enhancement, using 2, 4 and 8 additional grid points per realization — with the 
notations utilized in the previous section, the latter translates to m = N s + 2, N s + 4 and N s + 8. 
From the onset, it can be seen that the SADM scheme yields the worst agreement with the target 
value derived from the least-squares problem (2.1), and that inclusion of supplementary equa- 
tions to the SADM algorithm rapidly improves the accord. However minute, this improvement 
is perceptible as new grid points are added to the independent realizations. Equally perceptible 
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is the convergence property of the running average, reaching faster an asymptotic value upon 
addition of grid points. Congruent with what was established previously, the present set of re- 
sults emphasizes that the SADM method cannot recover the value derived from the least-squares 
equations. They further suggest that convergence towards the latter value will only be achieved 
in the limit where the number of added points coincides with the total number of grid points 
minus the number of parameters to be determined — i.e. one unique realization. 
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Figure 1: Running average of the point charge, Qoo, borne by the oxygen atom of water (N s = 
2 parameters) as a function of the number of independent realizations, wherein systems of 2 x 2 
(SADM), 4 x 2, 6 x 2 and 10 x 2 linear equations are solved. The thick, dark horizontal line at 
Qoo = —0.782 e.c.u. corresponds to the solution of the least-squares problem. 

Not too surprisingly, closer examination of the corresponding charge distributions in Figure 2 
reveals that as additional grid points are added to the individual realizations, not only does the 
width of these distributions narrow down, but the latter are progressively reshaped. As was 
conjectured in [6], the SADM algorithm yields Cauchy distributions, which is apparent from 
Figure 2. Improvement of the method alters the form of the probability function, now closer 
to a normal distribution. Interestingly enough, the slightly skewed shape of the distributions, 
particularly visible on their left-hand side — as a probable manifestation of the incompleteness of 
the electrostatic model, precludes perfect enveloping by the model distributions, either Cauchy- 
or Gaussian-like. 

Put together, the present computations reinforce the conclusions drawn hitherto, contra- 
dicting in particular the illegitimate assumption that the SADM and the least-squares solutions 
might coincide [6]. From a numerical standpoint, however, the results obtained from both strate- 
gies appear to be reasonably close, thereby warranting that the SADM algorithm should not 
be obliterated, as it constitutes a valuable pedagogical tool for assessing the appropriateness of 
electrostatic models. 



6 Conclusion 

In this work, a probabilistic approach to high-dimensional least-squares approximations has 
been developed. Originally inspired by the SADM method introduced for the derivation of 
distributed atomic multipoles from the quantum-mechanical electrostatic potential, this novel 
approach can be generalized to a wide class of least-squares problems, yielding convergent and 
efficient numerical schemes in those cases where the space of approximation is very large or 
where the problem is ill-conditioned. 
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Figure 2: Normalized distributions of the charge, Qoo, borne by the oxygen atom of water (N s = 
2 parameters) obtained from 500,000 independent realizations, wherein systems of 2 x 2 (SADM), 
4 x 2, 6 x 2 and 10 x 2 linear equations are solved (black curves). The light and dark curves 
correspond, respectively, to numerically fitted Cauchy and Gaussian distributions. 

This novel approach constitutes a marked improvement over the SADM method. Complete 
analysis of the numerical algorithm in general cases, in terms of both computational effort 
and optimal error estimation, relies on open and difficult issues prevalent to random matrix 
problems. 



A Proof of Lemma 3.1 

We denote by || • || the Schur-Frobenius norm on n x to matrices 

m n 

\\K =EE4- 

»=i j=i 

where A — (£Jij)i<j<n,i<j<m- With this notation, we have 

\\(A T A)- 1 A T f F = Tt((A t A)- 1 A t A(A t A)- 1 ) = Tr((A T A)^ 1 ) < 

In addition, for any v — . . . , v m ) € W n , we have 



81 



(AY 



(A.l) 



\Av\\ = ^ ^ aijVjdikVk 

i—1 l<j,k<7n 



< 



2 ] VjV k \ + ^2ai k \vjV k 



<\\mj v \\i\\ v \\o ^\\ a \\f\\ v K' 

where we used the inequality \bijbik\ < |(&fj + bf k ). 

With the notation of Lemma 3.1, using (A. 2) and (A.l), we have 

\\R(X)G{X)\\ 2 <V^s 1 (T(X))-^\\G{X)\\ 1 



(A.2) 
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Taking the expectation and using Holder's inequality, we get 

E\\R(X)G(X)\\ 2 < V^^K^)(^\\G{X)\\ P 1 ) 1 ' P . (A.3) 

Now, Y i-> {E\Y\Py/P defines a norm on the set of random vectors on Q with finite p-th-order 
moment. We, hence, obtain 

(lE||G(X)^) 1/f '=(E(^| 5 (X«" V " X " 



m 11.91 



Lp(n) 



i=l 

and this yields Lemma 3.1 (a). 

Lemma 3.1 (b) is obtained from similar computations. 

B Estimates on random matrices 
B.l On the finiteness of K q (T) 

As seen in Propositions 3.4 and 3.5, the convergence and the consistency of our algorithm rely 
on assumptions of the form K q {T) < +oo, where K q (T) is given by (3.1). These assumptions 
correspond to the finiteness of a negative moment of the random variable si(T(X)). The follow- 
ing proposition provides a condition on the distribution of si(T(X)) to ensure such integrability 
properties. 

Proposition B.l Let Y be a random variable satisfying the following estimate: There exist 
constants S > and 7 > such that 

Ve > 0, ¥(Y < e) < (5e) 7 . (B.l) 

Then, for any < r < 7, 

E{Y- r ) < -^- r . (B.2) 

1 — r/7 

Proof. The proof is based on the following integration by parts, where J °° h(x)d¥(Y £ [0,x)) 
denotes the Stieltjes integral of the measurable function h with respect to the Stieltjes measure 
on [0, 00) associated with the non-decreasing function x 1— > ¥(Y € [0, x)). 



E{Y- r ) = I 2T' r dP(y e [0,x)) 

rx'^^iY £ [0,x))dx 

OO 

< r I x- r - l {{5xy A l)dx. 







If r < 7, 

x-r-\(5xr A l)dx =-(—-), 







which entails (B.2). 
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Of course, the property (B.l) can be strongly problem-dependent. In general situations, 
this is related to difficult problems on random matrices, which, to our knowledge, have not 
been solved yet. However, explicit computations are possible in the specific instance where 
r(X) T T(X) has the Wishart distribution (see Subsections 3.5 and below). 

In the case where the matrix T T (X)T(X) is not a.s. invertible, the method described in 
Section 4 consists in taking expectations conditionally to {si(T(X)) > a} for some a > 0. A 
quantitative analysis of our method relies on estimates on K° (r) defined in (4.2) (see Propo- 
sitions 4.1 and 4.2). The following result generalizes Proposition B.l to the case where a > 0. 
Its proof is very similar to that of Proposition B.l. We will, hence, omit it here. 

Proposition B.2 Fix a > and assume that random variable Y satisfies the following esti- 
mate: There exist constants S and 7 such that 

Ve>CT, P(Y < e) < (Je) 7 . (B.3) 

Then, for any r ^ 7, if < a < 8 , 

This result is used to obtained explicit estimates on our algorithm in the case where the 
matrix T(X) has sub-Gaussian entries (see Sections 4.2 and B). 

B.2 Explicit estimates in the Wishart case 

The goal of this section is to prove the following result . 
Proposition B.3 In the Wishart case (see Section 3.5,), assume that f <E L p (i7) and p(J3) € 
L p (f2) with p > 2. Then, the convergence of the algorithm (in the sense that Tr(Cov(/3)) < +00, 
see Proposition 3. A) holds if 

p + 2 

m > n H . 

p-2 

In the case where p{fi) € h°°(Q), this condition corresponds to m > n + 2 and the computational 
cost of the algorithm is bounded by 

n 3 m 4 \ogn - 2 

(m — n + l)(m — n — 1) u 

where e is the required precision and the constant C is independent of n and m. The optimal 
value m* of m in the previous bound satisfies 

m* ~ 2n 

when n — > +00, and the corresponding computational cost is bounded by 

C'e- 2 n 5 logn\\p(P)\\l^. 
In addition, the consistency error of Proposition 3. 5 is bounded by 

C"n\\p(a)\L. 



18 



Our computations are based on the following estimate on the law of the smallest eigenvalue 
of Wishart matrices [5, Lemma 3.3], which reads with our notation as follows. For all m > n > 2, 
let 

k = m — n + 1 . 
The density p{x) of Si(T(X)) then satisfies 

L^e-^^x^- 1 < P (x) < L n>m e- x / 2 x%-\ Vx > 0, (B.5) 

where 

_ 2*- 1 $(^) 
n ' m ~ $(f)$(fc) ' (B ' 6) 
where $ is the Gamma function, defined for all x > by 

r+oo 

$(ar) = / e-H x ^dt. 



Lemma B.4 For all m > n> 2. the random variable Y — s\(T(X)) satisfies (B.l) for 

m — n + 1 k 
7 = 2 = 2 

ana o = e — ~. 

k z 

Moreover, the constant 7 above is the smallest such that (B.l) holds for all e > for some 
constant S. 

Proof. This result is based on the following bounds for the Gamma function [5, Lemma 2.7]. 

For all x > 0, 

V2tt x x+ ^e~ x < $(x + 1) = x$(x) < V2n x x+ ^e~ x+ ~^ . 

These inequalities can be plugged into (B.6) to get that, for all e > 0, 

77 2 1 ~ 1 $>( m ~ 1 + 1 1 r £ 

<T 2 v v 2 / 7 

n+l - ~ ■ ' a 



2vr(f) I ^-e-f ^(27)^+^-27 
2^7 (f)^ W 



Now, 



m— 1\7 l/e(m— 1)\7 

e 



Combining this inequality with the facts that m— 1 > 1 and 7 > 1/2 yields 
m / ,„„ rtl v e 1/6 /e 2 (m-l) \7 / e 2 m \1 

p( Sl (r ( x))< £ )<^=(^ i ^ E ) <(-^ £ ) . 

Because of (B.5), we have that p(x) ~ L„. m a; 7_1 as 1 -> 0. Therefore, one easily sees that 
7 = fc/2 is the minimal value of 7 for (B.l) to holds. ■ 

Using this result and Proposition B.l, we immediately obtain the following: 
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Proposition B.5 Let to > n be given and assume that the random matrix T(X) T T(X) associ- 
ated with the function T defined in (2.7) follows a Wishart distribution. Let q be such that 



l<q<k = m-n+l. (B.7) 

Then we have 

*. m <£(!-!)-' <B,) 

where K q is defined in (3.1). 

Combining this result and the result of Proposition 3.4, if p(fi) G L p (f2) with p > 2 , the 
convergence of the algorithm is ensured if K q < oo in (3.8) with 2p _1 + 2q~ 1 = 1. This means, 
(see (B.7)) 

2p 

2 < — — < m - n + 1 
p-2 

or equivalently 

^0 + 2 

m > n H . 

p-2 

Assume still that p{0) € L°°(f2). Using (B.8) with q = 2, it can be seen in view of (3.12) 
that the cost of the algorithm is bounded by 



Ce-*n*mHogn^(l-\) \\p(fi)\\l 



for some constant C independent of n and to. Using the notation 7 = fe/2, we can rewrite this 
cost in term of 7 as 

Ce 2 n 3 logn± / p09) 

7^7 — 1) L 

To determine the optimal choice of to, let us now try to find the optimal number 7 that minimizes 
this cost. The derivative of this expression with respect to 7 has the same sign as 

87(7 - 1) - (n + 27 - 1)(2 7 - 1) = 4 7 2 - 2(n + 2)7 + n - 1. 

Since this quantity is negative if 7 = 1/2, the only root of this polynomial greater than 1 is 
given by 

n + 2 + Vn 2 + 8 
7 = 4 ' 

which is the optimal choice of 7 in terms of computational effort. This yields an optimal choice 
to* ~ 27* + n — 1. Note that for large n, we have 7* ~ n/2 and m* ~ 2n. 

With this optimal choice, the computational cost of the algorithm can be written as 

C n e~ 2 ||p(/3)|| LOO with C„ — C?i 5 log n as n -> +00. 



Considering a similar calculation with (7 = 1, we can easily see that the consistency error of 
Proposition 3.2 for this choice of parameters can be bounded by 



C'||p(a)|| with C' n ~ C'n as n — > +00. 
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B.3 Explicit estimates in the sub-Gaussian case 

The goal of this section is to prove the following result. 
Proposition B.6 In the sub-Gaussian case (see Section A.2), assume that f £ h°°(Q) and 
p(/3) € L°°(f2). Then, there exists explicit constants A and B such that, if 



B 2 m 2 {^h~i - y/n — l) 2 „_ 2 Bm/(m-n+l) 



e 



A(m -n + 1) 2 
the computational cost of our algorithm is bounded by 

C ^7 ^T7^ idWflllL, CB-9) 

(to - n + l)(m - n - 1) " v " lL °° v 1 

where e is the required precision and the constant C is independent of n and to. Again, the 
optimal value to* of to in the previous bound satisfies to* ~ In as n — » +00. 
For suc/i a choice of to, we obtain 

a~C'n (B.10) 

/or an explicit constant C . 

With our notations, Theorem 1.1 of [20] writes as follows: there exist explicit constants A 
and B depending only on R such that, for all m > n and all e > 0, 

p(si(r(X)) < e(y/m- V^~T) 2 ) < (^ e )("-"+i)/2 + e -Bm_ ^ B11 ) 
Writing just like in Subsection B k for m — n + 1, it can be seen that 



( * 1 ( r ( .Y ) ) ; , . ( , f ) k +(e~ B ^ k ) 

- y TO - V 71 — 1 / 
. ( ^ I B -^* 

/to — \/n — 1 



Eq. (B.3), therefore, holds for Y = si(T(X)) and 

5 2 to 2 (v^- Vn - l) 2 _ 2jBm/fc 

ff - ,J0 - = fcM e ' 

(1 + k/Bm) 2 A 
(\/m — y/n — l) 2 

and 7 = |. 

Note that, since octo = (1 + Bm/k) 2 e~ 2Bm / k < 1, the inequality in (B.3) is not trivial and 
supplies some information on the law of si(r(X)). 

As in Subsection B, the inequality (B.4) can be combined with the results of Propositions 4.2 
and 4.1 to obtain a precise error estimate and convergence bounds in this case. 

Such computations are, however, cumbersome because the optimal choice of a cannot be 
determined explicitly. Taking a — o~q as in Proposition B.6 and under the assumption that 
p(/3 CT ) £ L°°(f2), because of Proposition B.2, the computational cost is smaller than 

„_ 2 3 , „ , 5o .v„2 m 3 5 ( 1 (8o)~ 1 - 1 \ 

Ce Vbgn||p(/nil L - (1 _ {Sap)2 (y-^ + T^j 
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If one assumes that (<To<5) 7 ->0asn-> +00, observing that 



^ _ (^/nl+ ^/n~^l) 2 (l + k/Bm) 2 A Cm 
d ~ P " ~P~' 

the cost is bounded from above by 

^Vlogn-^ll^)!!^ 

for 7 > 1. We recognize the same cost as in Subsection 3.5. The optimal choice of 7, therefore, 
behaves as n/2 as n — >• +00 — and for this choice we indeed have (<7o<$) 7 - ► 0, which validates 
the previous computation. Therefore, for this choice of parameters, the cost is bounded by 

Ce~ 2 n 5 logn||p(/3 CT )||^ oo 

for some constant C > 0. 

One can check that any other choice of a yields the same order in n as n — > +00 , should one 
choose 7 ~ n/2. 

It ought to be noted that these bounds do not allow one to pick a = 0. As far as we know, 
this seems to be an open and difficult question to prove that (B.ll) holds without the right- 
hand-side, additive term e~ Bm . In particular, it requires additional assumptions to hold — e.g. 
random variable h(Y), where Y has law v, has no atom, i.e. that v{{h = y}) = for all y € K 
(otherwise, the matrix T(X) could have m — n identical rows, and, thus, have a rank less than 
n, with non-zero probability). 
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